Lab:
Evaluating PROGRESA

Edward Vytlacil

Conditional Cash Transfer Programs

  • Government transfer of money
    • to households in extreme poverty,
    • conditional on household’s children being enrolled in school.
  • Goals:
    • reduce child labor,
    • increase human capital investments in children,
    • break “cycle of poverty.”

PROGRESA

  • Early CCT Program in Mexico.

  • Implemented as part of a RCT in 1997.

  • Experimental results were very positive.

  • See, e.g., Skoufias et al. (2001) (link), Schultz (2004) (link)

PROGRESA

  • Experimental results were very positive.

  • Led to rapid expansion in Mexico:

    • covered 2.6 million Mexican households in 50,000 villages by 2001,
    • renamed “Oportunidades” in 2002.

PROGRESA

  • Led to similar programs in:
    • Bangladesh, Brazil, Cambodia, Chile, Colombia, Egypt, Guatemala, Honduras, Indonesia, Jamaica, Nicaragua, Panama, Peru, Phillipines, Turkey and the United States.
    • See e.g. Attanasio et al. (2010) for Columbia.

PROGRESA Treatment: Cash transfer

  • Eligibility:
    • Extremely poor households,
    • living within poor, rural villages.
  • Conditional on:
    • school enrollment in grades 3 - 9,
      of child less than 18 years old.

PROGRESA Treatment: Cash transfer

Education Level of Student Monthly Payment
3rd Year 70
4th Year 80
5th Year 105
6th Year 135
7th Year Boys 200
Girls 210
8th Year Boys 210
Girls 235
9th Year Boys 225
Girls 255

Random Assignment at Village Level

flowchart TD
  A{495 &nbsp <br> Villages  &nbsp  <br> &nbsp} --> B{314 <br>  Treatment &nbsp &nbsp<br> Villages &nbsp &nbsp <br> &nbsp}
  A --> C{181  <br> Control  &nbsp &nbsp <br> Villages &nbsp &nbsp <br>&nbsp }
  B --> D[Eligible  Children &nbsp &nbsp <br> Treated &nbsp  &nbsp]
  B --> E(Noneligible Children,&nbsp  &nbsp  &nbsp &nbsp &nbsp <br> Not Treated &nbsp &nbsp)
  C --> F[Eligible  Children &nbsp &nbsp &nbsp <br> Controls &nbsp &nbsp]
  C --> G(Noneligible Children,&nbsp &nbsp  &nbsp &nbsp &nbsp <br> Not Treated &nbsp &nbsp)

RCT: 5 Survey Waves

  • Academic year pre-treatment: Aug 97 to July 98,
    • Wave 1: October 1998,
    • Wave 2: March 1998.
  • 1st academic year post-treatment: Aug 98 to July 99,
    • Wave 3: October 1998,
    • Wave 4: May 1999.
  • 2nd academic year post-treatment: Aug 99 to July 2000,
    • Wave 5: November 1999.

Analyze PROGRESA Data Using R

Libraries

Start by loading the following libraries. Install them first if you have not already done so.

  • stargazer for creating tables, by Hlavac (2022)
  • ggplot2 for making figures, by Wickham (2016)
  • plotly for making figures interactive, by Sievert (2020)
Code
library(stargazer)
library(ggplot2)
library(plotly)

Read in the data, explore the data

  • Use read.csv to read the data into a data.frame.
df <- read.csv("https://edward-vytlacil.github.io/Data/PROGRESA.csv",
                    header=TRUE, sep=",") # read csv

Explore the data

  • Explore data, make sure we understand variables, data structure, and look for any issues with the data.
  • Explore the data with R functions such as names, dim, head, table and summary.
names(df)
 [1] "wave"         "sooloca"      "indivill_seq" "villsize_t"   "progresa1"   
 [6] "sooind_id"    "sex1"         "age1"         "hgc1"         "poor1"       
[11] "school"      
dim(df)
[1] 74031    11

Explore the data

  • Explore the data with R functions such as names, dim, head, table and summary.
head(df)
  wave sooloca indivill_seq villsize_t progresa1 sooind_id sex1 age1 hgc1 poor1
1    1       1            1         76         1        18    0   10    1     1
2    1       1            2         76         1         5    0   11    2     1
3    1       1            3         76         1        30    0    9    2     1
4    1       1            4         76         1        42    1   14    3     1
5    1       1            5         76         1        14    1    7    0     1
6    1       1            6         76         1         8    0   12    2     1
  school
1      1
2      1
3      1
4      0
5      1
6      1

Explore the data

  • Explore the data with R functions such as names, dim, head, table and summary.
table(df$wave)

    1     2     3     4     5 
14996 12368 15455 15610 15602 
table(df$progresa1)

    0     1 
27723 46308 
table(df$hgc1)

    0     1     2     3     4     5     6     7     8     9 
 4669  8863  9685  9876  8926  8187 13296  3858  3049  3622 

Explore the data

  • Explore the data with R functions such as names, dim, head, table and summary.
summary(df)
      wave         sooloca     indivill_seq      villsize_t    
 Min.   :1.00   Min.   :  1   Min.   :  1.00   Min.   :  1.00  
 1st Qu.:2.00   1st Qu.:131   1st Qu.:  8.00   1st Qu.: 26.00  
 Median :3.00   Median :253   Median : 18.00   Median : 39.00  
 Mean   :3.06   Mean   :254   Mean   : 25.63   Mean   : 50.26  
 3rd Qu.:4.00   3rd Qu.:395   3rd Qu.: 34.00   3rd Qu.: 63.00  
 Max.   :5.00   Max.   :491   Max.   :210.00   Max.   :210.00  
   progresa1        sooind_id          sex1            age1      
 Min.   :0.0000   Min.   :    1   Min.   :0.000   Min.   : 0.00  
 1st Qu.:0.0000   1st Qu.: 3983   1st Qu.:0.000   1st Qu.: 9.00  
 Median :1.0000   Median : 7894   Median :0.000   Median :11.00  
 Mean   :0.6255   Mean   : 7877   Mean   :0.485   Mean   :11.33  
 3rd Qu.:1.0000   3rd Qu.:11768   3rd Qu.:1.000   3rd Qu.:14.00  
 Max.   :1.0000   Max.   :15669   Max.   :1.000   Max.   :99.00  
      hgc1           poor1       school      
 Min.   :0.000   Min.   :1   Min.   :0.0000  
 1st Qu.:2.000   1st Qu.:1   1st Qu.:1.0000  
 Median :4.000   Median :1   Median :1.0000  
 Mean   :4.029   Mean   :1   Mean   :0.8275  
 3rd Qu.:6.000   3rd Qu.:1   3rd Qu.:1.0000  
 Max.   :9.000   Max.   :1   Max.   :1.0000  

Histograms to Explore Data

ggplot(data = df, aes(x = as.factor(age1))) + geom_bar()+ xlab("age") 
ggplot(data = df, aes(x = as.factor(hgc1))) + geom_bar()+ xlab("hgc1") 

Histograms to Explore Data

ggplot(data = df, aes(x = as.factor(hgc1))) + geom_bar()+
  facet_wrap(~age1)+ xlab("highest grade completed")+ 
  ggtitle("HGC by Age")

Histograms to Explore Data

ggplot(data = df[df$age1>=6 & df$age1<=16, ] , aes(x = as.factor(hgc1))) + geom_bar()+ facet_wrap(~age1)+
  xlab("highest grade completed")+  ggtitle("HGC by Age (6-16)")

Explore the data

  • poor==1 for all observations, so that all observations are poor enough to be eligible for PROGRESA.
  • The youngest child is 0 years old, while the oldest child is 99 years old.
    • Possibly data entry error? such data problems very common.

Examine Unusual Age Observations

  • Investigating age distribution further. . .
sum(df$age1<6)
[1] 1
sum(df$age1>=6 & df$age1<=18)
[1] 73769
sum(df$age1>18)
[1] 261
  • While vast majority of rows have an age between 6 and 18,
    • 1 row has child less than 6,
    • 261 rows have child older than 18.

Examine Unusual Age Observations

Lets investigate the child with the reported age of 0.

df[df$age1==0,"sooind_id"]
[1] 1586
df[df$sooind_id==1586,c("sooind_id","wave","age1","hgc1")]
      sooind_id wave age1 hgc1
1418       1586    1    7    1
28916      1586    3   99    1
44394      1586    4    0    1
60001      1586    5    7    1

Examine Unusual Age Observations

  • The extreme ages appear to be data entry errors.

  • We could try to impute them, for example, impute that the previous child is age \(7\) in waves \(3\) and \(4\) based on his ages in waves \(1\) and \(5\).

  • Instead, we will set to missing.

Number of Observations

  • Data has 74031 rows (child-wave observations).
nrow(df)
[1] 74031
  • We do not have 74031 independent observations.

  • How many child-observations are in the data?

Number of Observations

  • Data has 74031 rows (child-wave observations).

  • How many child-observations are in the data?

    • Child id is sooind_id.

    • Use length and unique functions to find that there are 15669 child-observations in the data, of which 9799 are treated and 5870 are control.

length(unique(df$sooind_id))  
[1] 15669
length(unique(df[df$progresa1==1,"sooind_id"]))
[1] 9799
length(unique(df[df$progresa1==0,"sooind_id"]))
[1] 5870

Number of Observations

  • Data has 74031 rows (child-wave observations).

  • There are 15669 child-observations in the data, of which 9799 are treated and 5870 are control.

    • Child within the same village (and certainly in same family) may not be independent observations.
  • How many villages?

Number of Observations

  • Data has 74031 rows (child-wave observations).

  • There are 15669 child-observations in the data, of which 9799 are treated and 5870 are control.

  • How many villages?

    • Village id is sooloca.

    • Use length and unique functions to find that there are 491 villages, of which 308 are treated and 183 are control.

length(unique(df$sooloca))
[1] 491
length(unique(df[df$progresa1==1,"sooloca"]))
[1] 308
length(unique(df[df$progresa1==0,"sooloca"]))
[1] 183

Preparing data for analysis

  • Change name of variables for convenience:
    • progresa1 to treat,
    • sex1 to girl .
  • Using the subset function, restrict data to first wave (first baseline wave) and fourth wave (second post-treatment wave) and keep only relevant variables.
df$treat <- df$progresa1 
df$girl <- df$sex1 
df<-subset(df, wave==4 | wave==1,
           select= c(sooloca,sooind_id,age1,hgc1,school,treat,girl,wave))

Preparing data for analysis

  • Following Schultz (2004) create balanced sample, and additionally restrict to children aged 6 to 16 in first wave.
balanced_ids <- intersect(df[df$wave==4,"sooind_id"],
                  df[df$wave==1 & df$age1>=6 & df$age1<=16,"sooind_id"])
df2 <- df[df$sooind_id %in% balanced_ids,]

Preparing data for analysis

  • How many observations do we lose for balanced sample?

  • In full analysis, should redo exploratory analysis on balanced sample, compare to full sample.

nrow(df[df$wave==1,])
[1] 14996
nrow(df[df$wave==4,])
[1] 15610
nrow(df2[df2$wave==1,])
[1] 14996
nrow(df2[df2$wave==4,])
[1] 14996

Preparing data for analysis

  • Using the ifelse command, create an indicator variable for whether an observation is from the post period, i.e., a variable that equals 1 if the observation is from waves 4.

  • Create separate data frames for pre and post treatment observations.

df2$post <- ifelse(df2$wave==4, 1, 0)  
dfPre <- df2[df2$post==0,]  
dfPost <- df2[df2$post==1,]  

Preparing data for analysis

  • For more involved “data wrangling” with R, see either:
    • R Workflow by Frank Harrell, approach based on HMISC.
    • R for Data Science by Hadley Wickham and Garret Grolemund, approach based on tidyverse.

Balance at Baseline: Summary Table

  • Produce table examining balance in covariates between treated and control at baseline.

  • By randomization, no systematic differences in expectation.

  • However, some differences could arise by random chance.

Balance at Baseline
Control Treated Diff. Std Diff
Girl 0.49 0.48 -0.01 -0.03
Age 10.35 10.33 -0.02 -0.01
Highest Grade 3.41 3.44 0.03 0.01
Enrolled 0.86 0.86 0.001 0.003

Balance at Baseline: Summary Table

  • Standard to report table examining balance in baseline characteristics.
    • often including t-tests of null of equal means.
    • See Bruhn and McKenzie (2009) Section G.
Balance at Baseline
Control Treated Diff. Std Diff
Girl 0.49 0.48 -0.01 -0.03
Age 10.35 10.33 -0.02 -0.01
Highest Grade 3.41 3.44 0.03 0.01
Enrolled 0.86 0.86 0.001 0.003

Balance at Baseline: Summary Table

  • Researchers sometimes re-randomize if baseline not sufficiently balanced.
    • Does typically improve precision (similar to proper linear regression adjustment for baseline characteristics)
    • Invalidates conventional standard errors, inference.
Balance at Baseline
Control Treated Diff. Std Diff
Girl 0.49 0.48 -0.01 -0.03
Age 10.35 10.33 -0.02 -0.01
Highest Grade 3.41 3.44 0.03 0.01
Enrolled 0.86 0.86 0.001 0.003
  • See, e.g., Li, Ding, and Rubin (2018), Lin (2013), and Negi and Wooldridge (2021) .

  • Rerandomization is inferior to other methods to design the experiment or to adjust for differences at baseline. See, e.g., Bai (2022) and Cytrynbaum (2021)

Balance at Baseline: Summary Table

  • Code to produce previous table:
return_mean_by_treatment <- function(x){
  means.t<-tapply(x,dfPre$treat,mean)
  var.t<-tapply(x,dfPre$treat,var)
  return(c(means.t,var.t))
  }

vars <- c("girl","age1","hgc1","school")
output <- sapply(dfPre[vars],return_mean_by_treatment)

means <- output[1:2,]
diffs <- output[2,]-output[1,]
N1 <- sum(dfPre$treat)
N0 <- sum(1-dfPre$treat)
pooled.sd <- sqrt(((N0-1)*output[3,]+(N1-1)*output[4,])/(N0+N1-2))
std.diffs <- (output[2,]-output[1,])/pooled.sd 
results0 <- t(rbind(means,diffs,std.diffs))

colnames(results0)<-c("Control","Treated","Diff.","Std Diff")
varlabels <- c("Girl","Age","Highest Grade","Enrolled")
rownames(results0)<-c(varlabels)

stargazer(results0,  type="html", digits=2, title="Balance at Baseline")

Baseline: Enrollment by Grade, Sex

Baseline: Enrollment by Grade, Sex

  • Code to produce figure for boys:
MeanSchC.b  <-  with(subset(dfPre, treat==0 & girl==0),tapply(school, hgc1, mean))
MeanSchT.b <-  with(subset(dfPre, treat==1 & girl==0),tapply(school, hgc1, mean))

Grade <- as.factor(rep(c(1:10),2))
Group <- c(rep(" Control", 10), rep("Treated", 10))
MeanSch.b <- matrix(c(MeanSchC.b,MeanSchT.b), nrow = 20, ncol = 1)
tab.b <- data.frame(Grade, Group, MeanSch.b)
 
plotPre.b <- ggplot(tab.b, aes(x = Grade, y = MeanSch.b, fill = Group)) +
  geom_col(width = 0.7, position = position_dodge(width=0.8)) +
  theme_bw(base_size = 11) +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_y_continuous(breaks = seq(from = 0, to = 1, by = 0.1)) +
  xlab("Grade Level") +
  ylab("Mean School Enrollment")+
  ggtitle("Boys: School Enrollment by Grade, Pre Period")
 
ggplotly(plotPre.b,tooltip="y")  

Using Post Data to Estimate Effect

Using mean differences on post-treatment data to estimate effect of PROGRESA on school enrollment:

  • Overall: 0.039,

  • By sex:

    • For boys: 0.033,

    • For girls: 0.046.

 mean(dfPost[ dfPost$treat==1,"school"]) -  mean(dfPost[ dfPost$treat==0,"school"])
[1] 0.03923947
mean(dfPost[dfPost$treat==1&dfPost$girl==0,"school"])-mean(dfPost[dfPost$treat==0&dfPost$girl==0,"school"])
[1] 0.03280379
mean(dfPost[dfPost$treat==1&dfPost$girl==1,"school"])- mean(dfPost[dfPost$treat==0&dfPost$girl==1,"school"])
[1] 0.045535

Using Post Data to Estimate Effect

Using mean differences on post-treatment data to estimate effect of PROGRESA on school enrollment:

  • Overall: 0.039,

  • By sex:

    • For boys: 0.033,

    • For girls: 0.046.

  • Mean difference estimator justified by randomization, no selection bias.

  • Can take mean difference conditional on any covariate not effected by the treatment, including any baseline characteristic.

Estimate Effect by Grade, Sex

Code
MeansC <- with(dfPost,tapply(school,list(treat,girl,hgc1),mean))

dimnames(MeansC)[[1]] <-c("Control","Treated")
dimnames(MeansC)[[2]] <-c("Boys","Girls")

EffSchBoy <- (MeansC[2,1,] - MeansC[1,1,]) 
EffSchGirl <- (MeansC[2,2,] - MeansC[1,2,]) 

Grade <- as.factor(c(1:10, 1:10))
EffSch <- matrix(c(EffSchBoy, EffSchGirl), nrow = 20, ncol = 1)
sex <- c(rep("Boy", 10), rep("Girl", 10))
tabEff <- data.frame(Grade, sex, EffSch)

plotEff <- ggplot(tabEff, aes(x = Grade, y = EffSch, fill = sex)) +
  geom_col(width = 0.7, position = position_dodge(width=0.8)) +
  theme_bw(base_size = 10) +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_y_continuous(breaks = seq(from = -0.1, to = 0.1, by = 0.05)) +
  xlab("Grade Level") +
  ylab("Mean School Enrollment") +
  ggtitle("Estimated Treatment Effects on Enrollment by Grade and Sex")

# ggplotly(plotEff,tooltip="y")
plotEff

Adjust for differences at baseline?

Diff-in-diff estimates:
\[\begin{multline*}(\bar{Y}_{\mbox{Post},\mbox{T}}-\bar{Y}_{\mbox{Post},\mbox{C}}) -(\bar{Y}_{\mbox{Pre},\mbox{T}}-\bar{Y}_{\mbox{Pre},\mbox{C}}).\end{multline*}\]

  • Justification? Properties?

Diff-in-Diff

Estimates:

f.did <- function(d){
  tab0 <- with(d,tapply(school,list(treat,post),mean))
  did <- (tab0[2,2]-tab0[1,2])-(tab0[2,1]-tab0[1,1])
  round(did,3)
}
  • Boys: f.did(df2[df2$girl==0,]) 0.034

  • Girls: f.did(df2[df2$girl==1,]) 0.042

Diff-in-Diff

Precision?

  • Some results surprising, e.g., some negative estimated effects.

  • Random sampling noise?

    • With i.i.d. data, \(\mbox{Var}(\bar{X}_N)=\frac{\sigma^2}{N}\), where \(\sigma^2\) is the variance of each \(X_i\).

    • The higher the number of observations, the lower the sampling noise.

    • Should we think of sample size as number of children in a sex, treatment status, grade cell? number of villages in a cell?

No. of Obs. by Grade-Sex-Treatment

Code
person.counts <- with(dfPost,tapply(sooind_id,list(treat,girl,hgc1),function(x){length(unique(x))}))
dimnames(person.counts)[[1]] <-c("Control","Treated")
dimnames(person.counts)[[2]] <-c("Boys","Girls")
 
HGC <- as.factor(rep(c(0:9),2))
Group <- c(rep(" Control", 10), rep("Treated", 10))
numbers.boys <- matrix(c(person.counts[1,1,],person.counts[2,1,]), nrow = 20, ncol = 1)
numbers.girls<- matrix(c(person.counts[1,2,],person.counts[2,2,]), nrow = 20, ncol = 1)

tab.boys <- data.frame(HGC, Group, numbers.boys)
tab.girls <- data.frame(HGC, Group, numbers.girls)

plot.n.boys <- ggplot(tab.boys, aes(x = HGC, y = numbers.boys, fill = Group)) +
  geom_col(width = 0.7, position = position_dodge(width=0.8)) +
  theme_bw(base_size = 11) +  
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_y_continuous(limits=c(0,1100),breaks = seq(from = 0, to = 1100, by = 100)) +
  xlab("HGC") +
  ylab("Number of Boy Observations")+
  ggtitle("Number of Boy Observations, Treated and Control")



plot.n.girls <- ggplot(tab.girls, aes(x = HGC, y = numbers.girls, fill = Group)) +
  geom_col(width = 0.7, position = position_dodge(width=0.8)) +
  theme_bw(base_size = 11) +
  theme(legend.position = "bottom", legend.title = element_blank()) +
 scale_y_continuous(limits=c(0,1100),breaks = seq(from = 0, to = 1100, by = 100)) +
  xlab("HGC") +
  ylab("Number of Girl Observations")+
  ggtitle("Number of Girl Observations, Treated and Control")

  
village.counts <- with(dfPost,tapply(sooloca,list(treat,girl,hgc1),function(x){length(unique(x))}))
dimnames(village.counts)[[1]] <-c("Control","Treated")
dimnames(village.counts)[[2]] <-c("Boys","Girls")


numbers.boys.v <- matrix(c(village.counts[1,1,],village.counts[2,1,]), nrow = 20, ncol = 1)
numbers.girls.v<- matrix(c(village.counts[1,2,],village.counts[2,2,]), nrow = 20, ncol = 1)

tab.boys.v <- data.frame(HGC, Group, numbers.boys.v)
tab.girls.v <- data.frame(HGC, Group, numbers.girls.v)

plot.n.boys.v <- ggplot(tab.boys.v, aes(x = HGC, y = numbers.boys.v, fill = Group)) +
  geom_col(width = 0.7, position = position_dodge(width=0.8)) +
  theme_bw(base_size = 11) + 
  theme(legend.position = "bottom", legend.title = element_blank()) +
 scale_y_continuous(limits=c(0,1100),breaks = seq(from = 0, to = 1100, by = 100)) + 
  xlab("HGC") +
  ylab("Number of Villages with Boy Observations")+
  ggtitle("Number of Villages with Boy Observations")



plot.n.girls.v <- ggplot(tab.girls.v, aes(x = HGC, y = numbers.girls.v, fill = Group)) +
  geom_col(width = 0.7, position = position_dodge(width=0.8)) +
  theme_bw(base_size = 11) +
  theme(legend.position = "bottom", legend.title = element_blank()) +
 scale_y_continuous(limits=c(0,1100),breaks = seq(from = 0, to = 1100, by = 100)) + 
  xlab("HGC") +
  ylab("Number of Villages with Girl Observations")+
  ggtitle("Number of Villages with Girl Observations")

Results?

  • Estimated effects:

    • positive overall;

    • larger for girls than for boys;

    • vary widely across sex-grade cells.

  • To what degree are these results real, versus driven by sampling noise?

  • Next time: hypothesis testing!

References

Attanasio, Orazio, Emla Fitzsimons, Ana Gomez, Martha Isabel Gutierrez, Costas Meghir, and Alice Mesnard. 2010. “Children’s Schooling and Work in the Presence of a Conditional Cash Transfer Program in Rural Colombia.” Economic Development and Cultural Change 58 (2): 181–210.
Bai, Yuehao. 2022. “Optimality of Matched-Pair Designs in Randomized Controlled Trials.” arXiv Preprint arXiv:2206.07845.
Bruhn, Miriam, and David McKenzie. 2009. “In Pursuit of Balance: Randomization in Practice in Development Field Experiments.” American Economic Journal: Applied Economics 1 (4): 200–232. https://www.jstor.org/stable/25760187.
Cytrynbaum, Max. 2021. “Designing Representative and Balanced Experiments by Local Randomization.” arXiv Preprint arXiv:2111.08157.
Grolemund, Garrett. 2014. Hands-on Programming with r: Write Your Own Functions and Simulations. " O’Reilly Media, Inc.".
Hlavac, Marek. 2022. “Stargazer: LaTeX Code and ASCII Text for Well-Formatted Regression and Summary Statistics Tables” R package version 5.2.3. https://CRAN.R-project.org/package=stargazer.
(IFPRI), International Food Policy Research Institute. 2018. Mexico, Evaluation of PROGRESA.” Harvard Dataverse. https://doi.org/10.7910/DVN/05BMJY.
Lee, Soohyung, and Azeem M Shaikh. 2014. “Multiple Testing and Heterogeneous Treatment Effects: Re-Evaluating the Effect of Progresa on School Enrollment.” Journal of Applied Econometrics 29 (4): 612–26. https://home.uchicago.edu/~amshaikh/webfiles/progresa.pdf.
Li, Xinran, Peng Ding, and Donald B Rubin. 2018. “Asymptotic Theory of Rerandomization in Treatment–Control Experiments.” Proceedings of the National Academy of Sciences 115 (37): 9157–62.
Lin, Winston. 2013. “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” The Annals of Applied Statistics 7 (1): 295–318.
Negi, Akanksha, and Jeffrey M Wooldridge. 2021. “Revisiting Regression Adjustment in Experiments with Heterogeneous Treatment Effects.” Econometric Reviews 40 (5): 504–34.
Schultz, T Paul. 2004. “School Subsidies for the Poor: Evaluating the Mexican Progresa Poverty Program.” Journal of Development Economics 74 (1): 199–250.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with r, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.
Skoufias, Emmanuel, Susan W Parker, Jere R Behrman, and Carola Pessino. 2001. “Conditional Cash Transfers and Their Impact on Child Work and Schooling: Evidence from the Progresa Program in Mexico [with Comments].” Economia 2 (1): 45–96. https://www.jstor.org/stable/20065413.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.